A Language-Independent Anaphora Resolution System for Understanding Multilingual Texts

Authors

  • Chinatsu Aone
  • Douglas McKee
Abstract

This paper describes a new discourse module within our multilingual NLP system. Because of its unique data-driven architecture, the discourse module is language-independent. Moreover, the use of hierarchically organized multiple knowledge sources makes the module robust and trainable using discourse-tagged corpora. Separating discourse phenomena from knowledge sources makes the discourse module easily extensible to additional phenomena.

1 Introduction

This paper describes a new discourse module within our multilingual natural language processing system, which has been used for understanding texts in English, Spanish and Japanese (cf. [1, 2]). The following design principles underlie the discourse module:

  • Language-independence: No processing code depends on language-dependent facts.
  • Extensibility: It is easy to handle additional phenomena.
  • Robustness: The discourse module does its best even when its input is incomplete or wrong.
  • Trainability: The performance can be tuned for particular domains and applications.

In the following, we first describe the architecture of the discourse module. Then, we discuss how its performance is evaluated and trained using discourse-tagged corpora. Finally, we compare our approach to other research.

Footnote 1: Our system has been used in several data extraction tasks and a prototype machine translation system.

Figure 1: Discourse Architecture

2 Discourse Architecture

Our discourse module consists of two discourse processing submodules (the Discourse Administrator and the Resolution Engine) and three discourse knowledge bases (the Discourse Knowledge Source KB, the Discourse Phenomenon KB, and the Discourse Domain KB). The Discourse Administrator is a development-time tool for defining the three discourse KBs. The Resolution Engine, on the other hand, is the run-time processing module which actually performs anaphora resolution using these discourse KBs.

The Resolution Engine also has access to an external discourse data structure called the global discourse world, which is created by the top-level text processing controller. The global discourse world holds syntactic, semantic, rhetorical, and other information about the input text derived by other parts of the system. The architecture is shown in Figure 1.

2.1 Discourse Data Structures

There are four major discourse data types within the global discourse world: Discourse World (DW), Dis-
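
The separation described above, in which phenomena are declared as data and a generic engine applies knowledge sources to them, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the names `DiscourseEntity`, `KNOWLEDGE_SOURCE_KB`, `PHENOMENON_KB`, and `resolve`, as well as the three example knowledge sources, are hypothetical, and the Discourse Domain KB and the Discourse Administrator are left out. The sketch only shows why the engine code itself need not contain language-specific facts under these assumptions.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class DiscourseEntity:
    """Hypothetical stand-in for one entry in the global discourse world."""
    text: str
    position: int  # offset of the expression in the input text
    features: Dict[str, str] = field(default_factory=dict)


# A knowledge source takes an anaphor and candidate antecedents and
# returns a filtered or reordered candidate list (an assumed interface).
KnowledgeSource = Callable[
    [DiscourseEntity, List[DiscourseEntity]], List[DiscourseEntity]
]


def precedes(anaphor, candidates):
    # Filter: keep only candidates that occur before the anaphor.
    return [c for c in candidates if c.position < anaphor.position]


def number_agrees(anaphor, candidates):
    # Filter: keep only candidates whose "number" feature matches.
    number = anaphor.features.get("number")
    return [c for c in candidates if c.features.get("number") == number]


def most_recent_first(anaphor, candidates):
    # Orderer: prefer the candidate closest to the anaphor.
    return sorted(candidates, key=lambda c: c.position, reverse=True)


# Hypothetical knowledge source KB: name -> function.
KNOWLEDGE_SOURCE_KB: Dict[str, KnowledgeSource] = {
    "precedes": precedes,
    "number-agrees": number_agrees,
    "most-recent-first": most_recent_first,
}

# Hypothetical phenomenon KB: each phenomenon is pure data naming the
# knowledge sources to apply, so adding a phenomenon needs no new engine code.
PHENOMENON_KB: Dict[str, List[str]] = {
    "third-person-pronoun": ["precedes", "number-agrees", "most-recent-first"],
}


def resolve(anaphor: DiscourseEntity, phenomenon: str,
            discourse_world: List[DiscourseEntity]) -> Optional[DiscourseEntity]:
    """Generic engine: apply the phenomenon's knowledge sources in order."""
    candidates = [e for e in discourse_world if e is not anaphor]
    for name in PHENOMENON_KB[phenomenon]:
        candidates = KNOWLEDGE_SOURCE_KB[name](anaphor, candidates)
    # Return None rather than failing when no candidate survives.
    return candidates[0] if candidates else None


world = [
    DiscourseEntity("the companies", 0, {"number": "pl"}),
    DiscourseEntity("the bank", 5, {"number": "sg"}),
    DiscourseEntity("it", 9, {"number": "sg"}),
]
antecedent = resolve(world[2], "third-person-pronoun", world)
```

In this sketch, supporting a new phenomenon means adding an entry to `PHENOMENON_KB`, and supporting a new language would mean supplying different feature values in the discourse world, while `resolve` stays unchanged.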


Similar resources

Bilingual Pronoun Resolution: Experiments in English and French

Anaphora resolution has been a subject of research in computational linguistics for more than 25 years. The interest it aroused was due to the importance that anaphoric phenomena play in the coherence and cohesiveness of natural language. A deep understanding of a text is impossible without knowledge about how individual concepts relate to each other; a shallow understanding of a text is often ...


The Saara Framework: An Anaphora Resolution System for Czech

Determining reference and referential links in discourse is one of the biggest and most important challenges in natural language understanding. In particular, computing coreference classes over the set of referring expressions in text is crucial for its further syntactic and semantic processing. We present a system for automatic anaphora resolution that can be used on arbitrary texts in Czech. ...


Cooperation between Pronoun and Reference Resolution for Unrestricted Texts

Anaphora resolution is envisaged in this paper as part of the reference resolution process. A general open architecture is proposed, which can be particularized and configured in order to simulate some classic anaphora resolution methods. With the aim of improving pronoun resolution, the system takes advantage of elementary cues about characters of the text, which are represented through a part...


ZAC: Zero Anaphora Corpus A Corpus for Zero Anaphora Resolution in Portuguese

This paper describes a corpus of Brazilian Portuguese texts built in view of the construction of an Anaphora Resolution system, which is part of a fully-fledged Natural Language Processing system (STRING). The ZAC corpus is aimed at the resolution of the so-called zero-anaphora, that is, an anaphora relation where the anaphoric expression (or anaphor) has been zeroed. The paper briefly discusses...


Pronominal Anaphora Resolution in the KANTOO Multilingual Machine Translation System

We present an approach to pronominal anaphora resolution using KANT Controlled Language and the KANTOO multilingual MT system. Our algorithm is based on a robust, syntax-based approach that applies a set of restrictions and preferences to select the correct antecedent. We report a success rate of 93.3% on a training corpus with 286 anaphors, and 88.8% on held-out data with 144 anaphors. Our app...




Publication date: 1993